Linear regression introduction

Specify our own trend line by setting the intercept and the slope.

Generate some input values.

Calculate the mean or average trend of the response.
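The three steps above can be sketched as follows. The intercept, slope, input range, and number of inputs are all assumed values chosen for illustration; the source does not state them.

```python
import numpy as np

# Assumed "true" parameters of the trend line (illustrative values only).
true_intercept = -0.25
true_slope = 1.15

# Assumed inputs: 25 evenly spaced values between -3 and 3.
x = np.linspace(-3, 3, num=25)

# Mean trend: the average response at each input value.
mean_trend = true_intercept + true_slope * x
```

The mean trend is a deterministic function of x. No randomness is involved yet.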

Visualize the mean trend with respect to the input x.
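One possible way to draw the mean trend is a simple matplotlib line plot. This is a sketch with assumed parameter values, and the Agg backend line is only there so the code also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Assumed illustrative values, not taken from the source.
true_intercept, true_slope = -0.25, 1.15
x = np.linspace(-3, 3, num=25)
mean_trend = true_intercept + true_slope * x

# Draw the mean trend as a line against the input x.
fig, ax = plt.subplots()
ax.plot(x, mean_trend, color="black")
ax.set_xlabel("x")
ax.set_ylabel("mean trend")
```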

Generate random observations around the mean trend.

Specify the amount of noise in our example.
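A minimal sketch of the random observations, assuming Gaussian noise with a fixed standard deviation. The noise distribution, the value of `sigma`, the seed, and the trend parameters are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2100)  # assumed seed, for reproducibility only

# Assumed illustrative values, not taken from the source.
true_intercept, true_slope = -0.25, 1.15
sigma = 1.25  # amount of noise: standard deviation around the mean trend

x = np.linspace(-3, 3, num=25)
mean_trend = true_intercept + true_slope * x

# One random observation per input, centered on the mean trend.
y = rng.normal(loc=mean_trend, scale=sigma)
```

A larger `sigma` scatters the observations farther from the mean trend, and a `sigma` of zero would reproduce the trend exactly.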

Visualize the data.

If we go out and collect data, read data in from a database, or load data from a CSV file, we will NEVER know the true trend.

Repeat the random number generation to create several replicate data sets.

Reshape into long format.
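The repeat-and-reshape steps might look like the sketch below. It assumes six replicates (the count the text mentions for the visualization), Gaussian noise, and illustrative parameter values. The column names `y1` to `y6`, `replicate`, and `y` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2100)  # assumed seed

# Assumed illustrative values, not taken from the source.
true_intercept, true_slope, sigma = -0.25, 1.15, 1.25
x = np.linspace(-3, 3, num=25)
mean_trend = true_intercept + true_slope * x

# Wide format: one column of random observations per replicate.
wide = pd.DataFrame({"x": x})
for rep in range(1, 7):
    wide[f"y{rep}"] = rng.normal(loc=mean_trend, scale=sigma)

# Long format: one row per (input, replicate) pair.
long = wide.melt(id_vars="x", var_name="replicate", value_name="y")
```

Long format is convenient because plotting libraries such as seaborn can facet or color by the `replicate` column directly.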

Visualize the 6 different sets of random numbers.
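One way to show the six sets side by side is a faceted seaborn scatter plot. This sketch uses `sns.relplot()`, which is an assumption about the plotting call, and the data values and column names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2100)  # assumed seed
x = np.linspace(-3, 3, num=25)
mean_trend = -0.25 + 1.15 * x  # assumed intercept and slope

# Six replicate data sets stacked in long format (hypothetical column names).
long = pd.concat(
    [pd.DataFrame({"x": x,
                   "y": rng.normal(loc=mean_trend, scale=1.25),
                   "replicate": f"y{rep}"})
     for rep in range(1, 7)],
    ignore_index=True,
)

# One scatter-plot panel per replicate, three panels per row.
g = sns.relplot(data=long, x="x", y="y", col="replicate", col_wrap=3)
```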

Let's use sns.lmplot() to draw a best-fit line through the data.
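A minimal `sns.lmplot()` call could look like this. The data are simulated with assumed values, and `ci=None` (which hides the confidence band) is a choice made here, not something the source specifies.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2100)  # assumed seed
x = np.linspace(-3, 3, num=25)
y = rng.normal(loc=-0.25 + 1.15 * x, scale=1.25)  # assumed trend and noise
df = pd.DataFrame({"x": x, "y": y})

# Scatter plot of the data with a fitted straight line drawn on top.
g = sns.lmplot(data=df, x="x", y="y", ci=None)
```

The plot shows the fitted line but does not report the estimated intercept and slope.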

How do we fit models?

What would happen if I tried a different slope?

How can we pick the best line?

Use a nested for loop to create a grid of intercept and slope values.
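A sketch of the nested loop, assuming seven candidate values between -3 and 3 for each parameter. The ranges and counts are illustrative.

```python
import numpy as np

# Assumed candidate values for each parameter.
intercept_values = np.linspace(-3, 3, num=7)
slope_values = np.linspace(-3, 3, num=7)

# Outer loop over intercepts, inner loop over slopes:
# every intercept is paired with every slope.
grid = []
for b0 in intercept_values:
    for b1 in slope_values:
        grid.append((b0, b1))
```

With 7 values for each parameter, the grid holds 7 × 7 = 49 candidate lines.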

Calculate the mean trend for each combination of the intercept and slope.

Apply our function to every intercept and slope combination.
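The helper below is a hypothetical stand-in for "our function". The name `calc_trend` and its return layout are assumptions, as are the grid values. It computes the trend for one intercept and slope pair and is then applied to every pair in the grid.

```python
import numpy as np
import pandas as pd

def calc_trend(x, intercept, slope):
    """Return the trend for one (intercept, slope) guess as a DataFrame."""
    return pd.DataFrame({
        "x": x,
        "intercept": intercept,  # scalar, repeated on every row
        "slope": slope,          # scalar, repeated on every row
        "trend": intercept + slope * x,
    })

x = np.linspace(-3, 3, num=25)  # assumed inputs

# Assumed grid: every pairing of 7 intercepts with 7 slopes.
candidates = [(b0, b1)
              for b0 in np.linspace(-3, 3, num=7)
              for b1 in np.linspace(-3, 3, num=7)]

# Apply the function to each combination and stack the results.
guesses = pd.concat([calc_trend(x, b0, b1) for b0, b1 in candidates],
                    ignore_index=True)
```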

Visualize the behavior of all of the guesses with seaborn.

Calculate the error.

Summarize the squared error for each combination of intercept and slope.
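A sketch of the error calculation and its summary, under stated assumptions. The error is taken as observed minus guessed trend, and the squared errors are summarized by both their sum (SSE) and their mean (MSE). The source does not say which summary it uses, and the data and grid values are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2100)  # assumed seed
x = np.linspace(-3, 3, num=25)
y = rng.normal(loc=-0.25 + 1.15 * x, scale=1.25)  # assumed trend and noise

rows = []
for b0 in np.linspace(-3, 3, num=7):
    for b1 in np.linspace(-3, 3, num=7):
        # Error: observed response minus the guessed trend.
        error = y - (b0 + b1 * x)
        rows.append({
            "intercept": b0,
            "slope": b1,
            "sse": np.sum(error ** 2),   # sum of squared errors
            "mse": np.mean(error ** 2),  # mean squared error
        })

summary = pd.DataFrame(rows)
```

Squaring keeps positive and negative errors from cancelling, so a smaller summary value means the guessed line sits closer to the data overall.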

Look at the best models and compare to the data.
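One way to pick out the best guesses is to sort by the summarized squared error. The sketch below assumes the sum of squared errors as the summary and keeps the three lowest-scoring combinations. The data, the grid, and the choice of three are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2100)  # assumed seed
x = np.linspace(-3, 3, num=25)
y = rng.normal(loc=-0.25 + 1.15 * x, scale=1.25)  # assumed trend and noise

# Sum of squared errors for every guess in an assumed 7 x 7 grid.
summary = pd.DataFrame([
    {"intercept": b0, "slope": b1, "sse": np.sum((y - (b0 + b1 * x)) ** 2)}
    for b0 in np.linspace(-3, 3, num=7)
    for b1 in np.linspace(-3, 3, num=7)
])

# The three guesses with the smallest error, best first.
best = summary.nsmallest(3, "sse")

# Trend of the single best guess, ready to plot against the data.
top = best.iloc[0]
best_trend = top["intercept"] + top["slope"] * x
```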

What if we tried out even more combinations?
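With many more combinations, nested loops get slow. A possible alternative, swapped in here and not taken from the source, is to evaluate the whole grid at once with NumPy broadcasting. The grid size of 101 × 101 and the data values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2100)  # assumed seed
x = np.linspace(-3, 3, num=25)
y = rng.normal(loc=-0.25 + 1.15 * x, scale=1.25)  # assumed trend and noise

# A much finer assumed grid: 101 x 101 = 10,201 combinations.
intercepts = np.linspace(-3, 3, num=101)
slopes = np.linspace(-3, 3, num=101)
B0, B1 = np.meshgrid(intercepts, slopes, indexing="ij")

# Broadcasting: trend has shape (101, 101, 25), one line per grid cell.
trend = B0[..., None] + B1[..., None] * x
sse = np.sum((y - trend) ** 2, axis=-1)

# Locate the grid cell with the smallest sum of squared errors.
i, j = np.unravel_index(np.argmin(sse), sse.shape)
best_intercept, best_slope = intercepts[i], slopes[j]
```

A finer grid moves the best guess closer to the least-squares solution, but the answer is still limited by the grid spacing.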

Fit with built-in functions

https://www.statsmodels.org/stable/index.html
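A minimal sketch of a fit with the statsmodels formula interface, which is one of several ways the library can fit a linear model. The simulated data, seed, and column names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2100)  # assumed seed
x = np.linspace(-3, 3, num=25)
df = pd.DataFrame({
    "x": x,
    "y": rng.normal(loc=-0.25 + 1.15 * x, scale=1.25),  # assumed trend and noise
})

# Ordinary least squares: the formula "y ~ x" includes an intercept by default.
fit = smf.ols("y ~ x", data=df).fit()

# Estimated coefficients, indexed by "Intercept" and "x".
estimates = fit.params
```

The fitted object also exposes `fit.summary()` for a full table of estimates and standard errors.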